The occupation of a box 
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We illustrate how a simple statistical model can describe the quasiperiodic occurrence of large 
earthquakes. The model idealizes the loading of elastic energy in a seismic fault by the stochastic 
filling of a box. The emptying of the box after it is full is analogous to the generation of a large 
earthquake in which the fault relaxes after having been loaded to its failure threshold. The duration 
of the filling process is analogous to the seismic cycle, the time interval between two successive large 
earthquakes in a particular fault. The simplicity of the model enables us to derive the statistical 
distribution of its seismic cycle. We use this distribution to fit the series of earthquakes with 
magnitude around 6 that occurred at the Parkfield segment of the San Andreas fault in California. 
Using this fit, we estimate the probability of the next large earthquake at Parkfield and devise a 
simple forecasting strategy. 
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I. INTRODUCTION 

In reporting the mechanism of the great California 
earthquake of 1906, Reio- presented the elastic rebound 
theory. It assumes that an earthquake is the result of 
a sudden relaxation of elastic strain by rupture along a 
fault (a rupture surface between two rock blocks that 
move past each other). This theory extended earlier in- 
sights into the relation between earthquakes and faults 
by other geologists,- especially Gilbert)^ McKay^ and 
Koto/' Since its formulation, it has been the basis for 
interpreting the earthquakes that occur in faults in the 
Earth's upper, fragile crust. 

According to Reid's theory, elastic energy slowly accu- 
mulates on a fault over a long time after the occurrence 
of an earthquake, as the rock blocks on both sides of the 
fault are strained by tectonic forces. When the strain is 
large enough, the system relaxes by fast rupture and/or 
frictional sliding along the fault during the next earth- 
quake. The clastic waves generated by this sudden event 
are the seismic waves that seismometers detect. 

The tectonic loading and relaxation process of a fault 
is cyclic. The seismic cycle is the time interval between 
two successive large earthquakes on the same fault, fre- 
quently called characteristic earthquakes.- If the seismic 
cycle were periodic, earthquake prediction would be easy. 
There is increasing information about earthquake occur- 
rences in the seismic record, compiled with historical data 
and recognition of ancient large earthquakes on faults. 2 
These data show that the seismic cycle of any given fault 
is not strictly periodic. The reason is that the tectonic 
loading and relaxation of a fault are complex nonlinear 
processes. 8 Moreover, faults occur in topologically com- 
plex networks, 9 and an earthquake occurring in a fault 



influences what occurs in other faults^ 

The duration of the seismic cycle is not constant, but 
follows a statistical distribution that can be empirically 
deduced from the earthquake time series^ This distri- 
bution, if it were known, could be used to estimate the 
probability of the next earthquake. However, it is not 
well known, because there are little data (typically less 
than ten) in the earthquake time series for any given fault 
or fault segment^ To use this probabilistic approach, it 
is convenient to fit the data to a theoretical statistical 
distribution. 

Especially since the 1970s,i 2 ^^± 6 -'± 7 ->±£ earth- 
quake recurrence is frequently considered as a renewal 
process ; 19 ! 20 in which the times between successive 
events, in this case the large earthquakes in a fault, are 
assumed to be independent and identically distributed 
random variables. In this interpretation, the expected 
time of the next event does not depend on the details 
of the last event, except the time it occurred. In com- 
bination with elastic rebound theory, the probability 
of another earthquake would be low just after a fault- 
rupturing earthquake, and would then gradually increase, 
as tectonic deformation slowly stresses the fault again. 
When an earthquake finally occurs, it resets the renewal 
process to its initial state. Several well-known statistical 
distributions (such as the gammapl log-normal^ and 
Weibul l 16 i 21 ' 23 i 24 ) have been used to describe the dura- 
tion of the seismic cycle and to calculate the conditional 
probabilities of future earthquakes. These distributions 
also have been used as failure models for reliability and 
time-to-failure problems.— 

More recently, many numerical models have been de- 
vised for simulating the tectonic processes occurring on 
a seismic fault . 26 ' 27 These models can generate as many 
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synthetic earthquakes as desired^ so the statistical dis- 
tribution of the time intervals between them can be 
fully characterized.— Two highly idealized models are 
the Brownian passage time model^ and the minimal- 
ist modeh 31 ' 32 ' 33 Their seismic cycle distributions have 
been used as renewal models, to fit actual earthquake se- 
ries and estimate future earthquake probabilities . 30 ' 33 ! 34 
They, as well as the gamma, log-normal and Weibull dis- 
tributions, provide a reasonably good fit to the existing 
data . 21 ' 33 ' 34 The renewal models have been widely ap- 
plied, particularly in Japan 3 — and in the United States, 36 
to estimate the probabilities of the next large earthquake 
for particular faults. 

This paper aims to explain how a renewal model can 
fit the series of seismic cycles in a particular fault, and 
how it can be used to estimate the probability of the next 
large earthquake in the fault. For this purpose we will use 
the process of stochastic occupation of a box to visualize 
the progressive loading of a seismic fault. This box model 
will be used to fit the series of characteristic earthquakes, 
with magnitude around 6, which have occurred on the 
Parkfield segment of the San Andreas fault in California. 

In Sec. |TT]we present the data of the Parkfield series, 
and compute its mean, standard deviation and aperiod- 
icity (coefficient of variation). The presentation of these 
data is important for appreciating the design and tun- 
ing of the subsequent model. Section IIIII is devoted to a 
detailed presentation of the box model. In Sec. IIVI the 
parameters of the model will be tuned to fit the Parkfield 
data series. The comparison between the model and the 
data is made in Sec. [Vj and the annual probability of 
occurrence of the next large shock at Parkfield is cal- 
culated. In Sec. I VII we introduce a simple forecasting 
strategy for the box model and illustrate its effectiveness 
for the Parkfield sequence. In Sec. I VIII we present our 
conclusions. 



II. THE PARKFIELD SERIES 



mean of this series is 



rn 



1 6 

-^c, = 24.62 yr, 



(1) 



and its sample standard deviation (square root of the 
bias-corrected sample variance) is 



1 6 



1/2 



9.25 yr. (2) 



The coefficient of variation, or aperiodicity, is 

a = — = 0.3759. (3) 

m 



III. THE BOX MODEL 

In this section we will introduce a renewal model based 
on a simple cellular automaton. Cellular automata mod- 
els are frequently used to model earthquakes and other 
natural hazards.— These models evolve in discrete time 
steps, and consist of a grid of cells, where each cell can 
be only in a finite number of states. Each cell's state is 
updated at each time step according to rules that usually 
depend on the state of the cell or that of its neighbors 
in the previous time step. For example, a grid of cells 
can represent a discrctizcd fault plane, and the rules can 
be designed according to certain friction lawSj2i and in- 
clude stress transfer— i 41 i 42 and the mechanical effects of 
fluids^ 3 - In the simplest models ; 31 ' 44 these details are ig- 
nored: the model is driven stochastically, there are only 
two possible states for each cell, and the earthquakes are 
generated according to simplified breaking rules. The 
model proposed here is of this last type. It is simple, 
easy to describe analytically, and generates a temporal 
distribution of seismic cycles comparable to that of an 
actual fault. 



The San Andreas fault runs for 1200 km, from the Gulf 
of California (Mexico) to just north of San Francisco, 
where it enters the Pacific Ocean. Fortunately, it does 
not slide or break in its whole length as a single earth- 
quake. Rather, as for other long faults, each earthquake 
ruptures only one or a few sections (segments) of its 
length. During the last century and a half, several earth- 
quakes with magnitude around 6 have occurred along a 
35 km-long segment of the San Andreas fault that crosses 
the tiny town of Parkfield, CA. The apparent tempo- 
ral regularity of this series has lead to extensive seismic 
monitoring in the area . 37 ' 38 Including the most recent 
event, this Parkfield serie o 37 ' 39 consists of seven shocks 
which occurred on 9 January 1857; 2 February 1881; 3 
March 1901; 10 March 1922; 8 June 1934; 28 June 1966; 
and 28 September 2004. The durations (in years) of the 
six observed seismic cycles are c\ = 24.07, c-i — 20.08, 
c 3 = 21.02, c 4 = 12.25, c 5 = 32.05, and c 6 = 38.25. The 



A. The rules 

Consider an array of N cells. The position of the cells 
is irrelevant, but we can assume that they are arranged 
in the shape of a box (see Fig. []}. At the beginning of 
each cycle, the box is completely empty. At each time 
step, one ball is thrown, at random, to one of the cells 
in the box. That is, each cell has equal probability, 1/N, 
of receiving the ball. If the cell that is chosen is empty, 
it will become occupied. If it was already occupied, the 
thrown ball is lost. (Thus, each cell can be either occu- 
pied by a ball or empty.) When a new throw completes 
the occupation of the N cells of the box, it topples, be- 
coming completely empty, and a new cycle starts. The 
time elapsed since the beginning of each cycle, expressed 
by the number of thrown balls, will be called n. The du- 
ration of the cycles is statistically distributed according 
to a discrete distribution function P/y(n). 
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FIG. 1: Sketch of the box model. Balls are thrown at random 
until all the cells of the box are full. Then the box is emptied 
and a new cycle starts. 



The box represents the area of the fault or fault seg- 
ment, and the random throwing of balls represents the 
increase of regional stress. This randomness is a way of 
dealing with the complex stress increase on actual faults. 
The occupation of a cell by a ball stands for the elastic 
strain induced by the stress in a patch or element of the 
fault plane. The loss of the balls that land on already oc- 
cupied cells mimics stress dissipation on this plane. The 
total elastic strain (or conversely the total potential elas- 
tic energy) accumulated in the fault is represented by the 
number of occupied cells. This number gradually grows 
up to the limit N (analogous to the failure threshold of 
the fault), and the toppling of the box represents the oc- 
currence of the characteristic earthquake in the fault. It 
is easy to simulate the evolution of this system with a 
Monte Carlo approach. 

This model is similar to that introduced by Newman 
and Turcotte in Ref. |3 The difference is that their 
model is a square grid of cells in which the topology is 
relevant: they consider that the characteristic earthquake 
occurs when a percolating cluster— spans the grid. This 
cluster happens before the grid is completely full. 



B. Some formulas of the box model 

To describe the box model analytically, it is conve- 
nient to recall some elements of the geometric distribu- 
tion. Consider the probability that exactly x indepen- 
dent Bernoulli trials, each with a probability of success 
p, will be required until the first success is achieved. The 
probability that (x—1) failures will be followed by a suc- 
cess is (1 — p) x ~ 1 p. The resulting probability function, 



P< 



(4) 



is known as the geometric distribution. Its mean and 
variance are 



(x) 



1 

P 



p2 



(5) 



Now we deal with the box model further. In each cycle, 
the filling of the box proceeds sequentially and continues 
until the Nth cell is occupied. Because each of these se- 
quential steps is an independent process, the mean num- 
ber of throws to completely fill the box will be 



(n) N = (xi) N + (x 2 )n H h (xn)n, 



(6) 



where (xi) n is the mean number of throws it takes to fill 
the ith cell. 

Because the filling of the ith cell is geometrically dis- 
tributed with pi — (N + 1 — i)/N, it follows that 



{Xi)N 

and therefore 



N 



N + 1 —i 



N 



(n) 



N 



(z = l,2,...,A0 



N 



N + l-i 



(7) 



(8) 



Relations similar to Eqs. and © can be written 
for the variance of the number of thrown balls to fill the 
box, namely 



crj + 4 



N 1- 

°+E 

i=2 



N+l-i 
N 

N + l-i 
N 



(9) 



and consequently, the standard deviation is 



N 

E 



N(i - 1) 
(N + l-i) 2 



1/2 



(10) 



The aperiodicity of the series, a^r, for a given N is 



aN 



(n) 



N 



(11) 



The mean and the standard deviation of the box model 
can be calculated by summing the N — 1 terms of Eqs. ([5]) 
and (fT0| . For N > 10, these two equations can be 
approximated (with an absolute error < 0.01) by their 
asymptotic expressions^ 



(n) N > N{C + lnN) + 

N—>oc 2 



where C ~ 0.5772157 is Euler's constant, and 



cat 



-» N 



1 + C + miV 
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N 
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(12) 



(13) 



and the aperiodicity can be estimated by using Eq. (|11|) 
with Eqs. (T2]) and (fn 
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The function Pn (n) is not as easy to obtain as its mean 
and standard deviation, and is given by 

w^eW^;^! -£)""', (i4, 

and the accumulative probability function, A^(n): 

n 

A N (n) = Y,PnU)= (15) 

We have deduced Eq. fM]) by means of a Markov chain 
approach analogous to the one used in Refs. HH and [32|. 
This derivation is omitted here because of its length. 

IV. FITTING THE PARAMETERS OF THE 
BOX MODEL 



t ' 1 ' 1 ' r 




Total time steps per cycle, n 



FIG. 2: Discrete distribution function for the duration (in 
time steps, n) of the seismic cycle in the box model with 
AT = 11. 



We will fit the Parkfield series to the accumulative 
probability function, Eq. (fT5|) . using the method of 
moments^ Another method that could be used is that 
of maximum likelihood. — We have seen in Sec. |ffl] that 
the aperiodicity in the box model depends only on N. 
Thus, we will choose N for which the aperiodicity is the 
closest to that of the Parkfield series, that is, a ~ 0.3759. 
The result is N = 11, for which, from Eq. (jTTj) . the ape- 
riodicity is a = 0.3752. 

From Eq. ([8]) the mean value of n for JV = 11 is 
(n)]y = ii — 33.22. Because the actual mean of the Park- 
field series is m — 24.62 yr, one ball throw in the model 
is equivalent to r = 24.62 yr/33. 22 = 0.74 yr « 9 months. 
The discrete distribution function for the duration of the 
seismic cycle in a box model with N = 11, Pn(n) is 
shown in Fig. [21 

In Fig. [3] we plot the evolution of the number of occu- 
pied cells for ten cycles with N = 11. Note the fluctua- 
tions in the duration of the cycles, which are consistent 
with the mean and the standard deviation of the series. 
Note also that the occupation increases rapidly just after 
a toppling, and then slows down. This increase is due to 
the fact that, as a cycle progresses, there are more occu- 
pied cells, and thus it is less probable for an incoming ball 
to land on an empty cell. If p n is the fraction of occupied 
cells at time step n, there is a probability 1 — p n for the 
next ball to be thrown to an empty cell. Because such a 
throw would increase p by 1/iV, the mean p at step n + 1 
is 

(Pn+l) = <Pn) + i[l-(p„)]. (16) 

The box is empty at the beginning of the cycle (po = 0), 
so from Eq. p6[) . the mean value of p n is 



which approaches one asymptotically. 

In real faults, the strain also does not increase uni- 
formly during the seismic cycle. Instead, it follows a 
trend similar to that of the number of occupied cells in 
the box model: the loading rate is faster just after a large 
earthquake, and decreases over time 4^ 

The relaxation of a real fault by means of a large 
earthquake reduces the stress in the system. Thus a 
minimum time has to elapse before the fault accumulates 
enough stress to produce the next large earthquake. 
This effect is called stress shadow. 10 In the box model 
there exists a stress shadow: a characteristic earthquake 
cannot occur until the Nth step in the cycle. According 
to the box model, this minimum time for the Parkfield 
series is tN ~ 8 yr. 
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FIG. 3: Plot of the number of occupied cells during ten cycles 
of a box model with N = 11. 
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EARTHQUAKE PROBABILITIES AT 
PARKFIELD 



We now evaluate the quality of the box model fit for 
the Parkfield series and estimate the probability of the 
next earthquake in this fault segment. In Fig. QJa), the 
empirical distribution function of the Parkfield series is 
plotted. It is an accumulative step function ranging from 
to 1.0, with a jump 1/6 at each of the six observed 
recurrence intervals Cj. The accumulated distribution of 
the box model in Eq. (fl5|) for N — 11 with t = 0.74yr 
also is drawn. In Fig. S£b), we show the residuals of the 
fit, which do not surpass 7.5%. The equivalent fits to 
these data, made by using the renewal models cited in 
Sec. U give very similar results^ 
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FIG. 4: (a) Fit of the accumulative distribution of the box 
model to the accumulated histogram of the Parkfield earth- 
quake sequence, (b) Residuals of the fit, evaluated at the 
midpoints of the horizontal segments of the accumulated his- 
togram. 



to actual years, nr + to, where in is the calendar year at 
which the last earthquake occurred (to — 2004.75 for the 
Parkfield series). In Fig. [5] we plot the yearly probability 
for the new cycle at Parkfield according to the box model. 
During the first eight years after the last earthquake at 
Parkfield (which occurred in September 2004), the box 
model indicates that another big shock should not be 
expected. From that time on, the probability of the next 
earthquake increases, tending to a constant equal to 11%. 
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FIG. 5: Yearly probability of the next characteristic earth- 
quake at Parkfield, according to the box model. 



In the seismological literature there is a well-known 
question about the yearly probability for a time much 
longer than the mean value of the series^ "The longer 
it has been since the last earthquake, the longer the ex- 
pected time till the next?" Sornette and Knopoff 4 - have 
discussed some statistical distributions that lead to af- 
firmative, negative, or neutral answers to it. The re- 
sult shown in Fig. \5\ leads us to conclude that the box 
model produces a neutral answer. The reason is that for 
a long cycle duration (large n), the Pjv(«) of the box 
model decays exponentially, and asymptotically the box 
model behaves as a Poisson model, in which the condi- 
tional probability of occurrence of the next earthquake is 
a constant. 



VI. A SIMPLE FORECASTING STRATEGY 



Now we calculate the yearly probability of the next 
earthquake, that is, the conditional probability of the 
next shock occurring in a certain year, given that it has 
not occurred previously. For the box model it is 



P T (N, n) 



A N (n + l/T)~A N (n) 
l-Ajv(n-l) 



(18) 



Note that 1/r is the number of time steps of the box 
model corresponding to one year. After calculating P T 
from Eq. (fT5)) , it is necessary to rescale the abscissas, n, 



In earthquake forecasting an "alarm" is sometimes 
turned on when it is estimated that there is a high prob- 
ability for a large earthquake to occur If a large shock 
takes place when the alarm is on, the prediction is con- 
sidered to be a success. If it takes place when the alarm 
is off, there has been a failure to predict. The fraction 
of errors, f e , is the number of prediction failures divided 
by the total number of large earthquakes. The fraction 
of alarm time, f a , is the ratio of the time during which 
the alarm is on to the total time of observation. A good 
strategy of forecasting must produce both small f e and 
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f a , because both the prediction failures and the alarms 
are costly. Depending on the trade-off between the costs 
and benefits of forecasting— we can try to minimize a 
certain loss function, L. For simplicity, we will consider 
a simple loss function defined as 

L = .fa+ U (19) 

A random guessing strategy (randomly turning the 
alarm on and off) will yield L — 1, a result which can 
be easily understood. The alarm will be on, randomly, 
during a certain fraction of time, f a . Thus, there will be 
a probability equal to f a for it being on when an earth- 
quake eventually occurs (and a probability of 1 — f a for 
it being off). The result is that f e = l — f a . As a trivial 
special case, if the alarm is always on (f a = 1), then all 
the earthquakes are "forecasted" (f e = 0). Conversely, 
all the earthquakes are failures to predict if the alarm is 
always off. The random guessing strategy is considered 
as a baseline, so a forecasting procedure makes sense only 
if it gives f a + f e < 1. 

We can use the box model fit to the Parkfield series 
to implement a simple earthquake forecasting strategy. 
The strategy consists in turning on the alarm at a fixed 
value of n time steps after the last big earthquake, and 
maintaining the alarm on until the next one i 32 ^ 44 Then 
the alarm is turned off, and the same strategy is repeated, 
evaluating f a and f e for all the cycles. The result is 

n 

f e (n) = P(n'), (20) 

n'=l 

and 

,. w _ SatJW&^o. (2i, 

22n'=o p ( n ) n 

These relations are illustrated in Fig. [HKa), together with 
L = f a + fe- For each value of N, L(n) has a minimum at 
a specific value of n, n*(N). As can be seen in Fig. [S£a), 
ra* (11) = 20, for which 

f a (n*) = 0.405, f e (n*) = 0.085, L{n) = 0.490. 

(22) 

For the Parkfield sequence, n* corresponds to 

rn* = 14.8 yr. (23) 

If the distribution derived from the box model correctly 
describes the recurrence of large earthquakes at Parkfield, 
the alarm connected at this time since the beginning of 
the cycles and disconnected just after the occurrence of 
each shock would yield the results given in Eq. (f2"2")) . Note 
that this time is substantially smaller than the mean du- 
ration of the cycles, m — 24.62 yr. 

The quality of the model-earthquake forecast also can 
be understood visually by means of an error diagram, 
Fig. [6jb), in which f e is plotted versus f a ^ This kind 
of plot is similar to the receiver operating characteristic 
diagram, used, for example, to test the success of weather 
forecasts^ 



1.0 




fraction of alarm time, fa 

FIG. 6: (a) Fraction of errors {fe), fraction of alarm time 
(fa), and loss function (L = f a + fe) as a function of n for the 
forecasting strategy in a box model with N = 11. (b) Error 
diagram for this strategy. Each point on the curve is the result 
of using a different value of n. The large dot corresponds 
to n*, for which the loss function reaches a minimum. The 
diagonal lines are isolines of L. A random guessing strategy 
would render L = 1. 



VII. SUMMARY 

The generation of large earthquakes in a seismic fault 
involves slow loading of elastic strain (or, conversely, elas- 
tic energy), and release through rupture and/or frictional 
sliding during an earthquake. The duration of this pro- 
cess is the seismic cycle, which is repeated indefinitely, 
leading to a series of recurrent shocks. We have illus- 
trated this process with a very simple model. The load- 
ing of elastic strain is represented by the stochastic filling 
of a box with N cells. The emptying of the box after its 
complete filling is analogous to the generation of a large 
earthquake, in which the fault relaxes after having being 
loaded to its failure threshold. The duration of the filling 
process is thus equivalent to the seismic cycle. 

The statistical distribution of seismic cycles in the box 
model (just as the distributions of the Brownian pas- 
sage time models and the minimalist mode l 31 ' 32 i 33 ) can 
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be used to fit actual earthquake series and to estimate 
earthquake probabilities. The conditional probability of 
the box model has two interesting features. First, after 
a large earthquake, there is a period of stress shadow 
during which a new large earthquake cannot occur. Sec- 
ond, from this time on the probability continuously in- 
creases, approaching a constant asymptotic value. By 
using a simple forecasting strategy, we have shown that 
the earthquakes in the model are predictable to some ex- 
tent. 

We hope that our discussion will be a useful educa- 
tional tool for introducing several important geophysical 
and statistical concepts to graduate and undergraduate 



students. It could illustrate how to make quantitative 
estimates of a natural phenomenon as popular and as 
mystifying as earthquakes. 
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FIG. 1: Corrected version of Fig. 6 of the original article. 



The values of f e plotted in Fig. 6 and used to estimate 
n* were determined by a Monte Carlo simulation of the 
model instead of using Eq. (20). If, at a given time step in 
the simulation, the "alarm" was sounded and the model 
earthquake occurred, the latter was deemed as success- 
fully forecasted. This assumption is incorrect, and leads 
to a value of f e smaller than the true one in Eq. (20). 
Given that n is the number of time steps before sounding 
the alarm, if the earthquake occurs at the nth time step, 
the alarm has still not been sounded, and the earthquake 
should be considered a prediction failure. An earthquake 
in the box model cannot occur before the iVth time step 
of each cycle, so f e = if and only if n < N. This error 
caused f e = also for n = N. 

We give here a revised version of Fig. 6. The correct 
values of f e and L = f a + f e are only slightly higher than 
those previously published. This correction changes the 
value of n* (19 time steps, instead of 20). It also modifies 
the results in Eq. (22), 

f a (n*) = 0.432, f e (n*) = 0.084, L{n*) = 0.516, (1) 

and the value in Eq. (23) for the Parkfield sequence: 

rn* = 14.1yr. (2) 

We apologize for this error and hope that this note 
will serve to clarify the convention for calculating f e with 
discrete probability distributions. 
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This is an informal appendix to the paper "The occupation of a box as a toy model for the seismic 
cycle of a fault" {American Journal of Physics, 73(10), 946-952), where we illustrated how a simple 
statistical model can describe the quasiperiodic occurrence of large earthquakes in a seismic fault. 
This appendix describes some proofs that could not be included in the original paper because of 
their length. Namely, we deduce here: 

(1) the discrete probability distribution for the duration of the seismic cycle in the model; 

(2) the asymptotic mean and standard deviation of that distribution (when the number of cells in 
the model tends to infinity); and 

(3) the asymptotic conditional probability in this model (when the time since the last earthquake 
tends to infinity). 



I. DISCRETE PROBABILITY DISTRIBUTION 
FOR THE DURATION OF THE SEISMIC CYCLE 

The discrete probability distribution for the duration 
of the seismic cycle in the box model was named P/y(n), 
and written in Eq. 14 of the original paper—. 

The box model is a Markov chain^, and this enables 
to deduce P/v(n) by using a technique^ that we already 
applied to the Minimalist Model 4 , which is also Marko- 
vian. A Markov chain is a stochastic process defined by a 
discrete random variable X that 1) can only take a finite 
number of values, and 2) whose value at the next time 
step depends only upon the value at the present time 
step, being independent of the way in which the present 
value arose from its predecessors. In other words, a 
Markov chain has no memory: the evolution of a Markov 
system at any time depends only on the state of the sys- 
tem at that time and not on the history of how the state 
was achieved. 

In a box model with N cells, the state is only deter- 
mined by the number of occupied cells, that here will be 
called v. The succession of values of this random variable 
defines the stochastic process of the box model. Note that 
exactly which cells are occupied is not relevant, but only 
how many of them are. The number of stable states in 
the model is N; in each of them v takes one value in the 
set {0, 1, 2, 3 . . . (N — l)}. If N cells become occupied, the 
system instantly changes to the empty state. It does not 
reside any time step in the state of complete occupancy, 
so this is not a stable state. 

The value of v in the next time step only depends on 
the value of v in the current time step, so it follows the 
definition of a Markov chain. For example, if the system 
is empty (v = 0), in the next time step, for sure (with 



probability equal to 1) it will move to the state of v = 1. 
In this second step the fraction of occupied cells is 1/N, 
and the fraction of empty cells is (N — l)/N. So, in the 
third time step, with probability (N — l)/N another cell 
will be occupied by a ball (y becoming equal to 2), or 
the model will remain in v = 1 with probability l/A^ 
(the probability of the incoming ball landing in the only 
occupied cell). In general, for v < N — l, there is a 
probability (N — v)/N of moving to v + 1 in the next 
time step. If v = N — 1, there is a probability (N — \)/N 
of moving to v = (passing through v = N, but not 
residing any moment there). In each time step there is a 
probability v /N of residing in the same state during the 
next time step. 

As for any other Markov chain, for the box model we 
can define a transition matrix M, a table that contains all 
the transition probabilities of passing, in one time step, 
from any of the states of the system to any of the others 
or to itself. Each element of the matrix will be denoted 
in the standard way as M(i, j), being i the row (from top 
to bottom), and j the column (from left to right). Each 
element gives the probability of moving from the state 
X = i in the time step n to the state X = j in the step 
n + 1: 

M(i,j)=P(X n+1 =j\X n =i). (1) 

The transition matrix of the box model is different for 
each N: as shown above, the transition probabilities de- 
pend on TV, and because there are N stable states, the 
size of the matrix is N x N. Thus we will denote the ma- 
trix for the box model as Mjv- Denoting the occupation 
state with v as above, the element Mjv(z, j) will be the 
transition probability from v = % — Wo v = j — 1: 

M N (i,j) = P{v n+l =j-l\v n =i-l). (2) 
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The cause for this difference in notation is that v ranges 
from to N — 1, while i and j range from 1 to N. 

Let us now deduce the discrete probability distribution 
for the duration of the seismic cycle, using the formalism 
of Markov chains. The discrete distribution Pjv(n) de- 
fines the probability that, for a box model of N cells, 
the seismic cycle lasts n time steps. The seismic cycle 
starts when v — 0, and lasts until v = again. Ex- 
cept for N = 1, which is a trivial, special case of the 
model, there is no possible transition in one time step 
from v = to v — (remember that this impossibility 
causes the stress shadow in the model). Speaking more 
generally, in the first n — 1 time steps of the cycle there 
is no transition to the state v = 0. Because of this, to 
calculate -P/v(n) we will first deduce the probability that 
the system evolves from v = §tov = N — linn — 1 time 
steps, without having passed through v = in between. 
To calculate the probability that the system evolves from 
v = N — 1 to f = is simpler. At the beginning of the 
n-th step the system has v — N — 1 . Then a new particle 
is added to the only one empty cell, so the occupation 
becomes v = N, but instantly drops to v = at the end 
of the step. The transition in the time step is thus from 
v = N — 1 to f = 0. The probability for this to happen 
is 1/N, the chance for the incoming particle to land in 
the only empty cell of the array when v = N — 1. 

Thus, the deduction of P/v(ro) proceeds as follows: 

1. Deduce the probabilities of passing between the 
different states of the system in one time step. 
These transition probabilities will be tabulated in 
the transition matrix Mji. 

2. Remove from Mjv the possibility of intermediate 
transitions to v = 0. The resulting matrix will be 
called M' N . 

3. Calculate the transition probabilities of passing be- 
tween the different states in n — 1 time steps and 
neglecting the possibility of passing through the 
state with v = 0. The result is a new matrix, 
T N = M'l-\ 

4. One of the elements of this matrix will indicate the 
probability of passing from v = 0tov = N — 1 
in n — 1 time steps without having passed through 
v = in between. Multiplying this probability by 
1/N we will obtain P/v(n). 

Let us proceed in this order. The transition matrices 
for N equal to 2, 3 and 4 are as follows: 



and for N 
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All the elements of these matrices are nonnegative (they 
are probabilities) and the sum of all the elements of any 
row is always 1. These two are necessary and sufficient 
properties of transition matrices of Markov chains. These 
matrices show evident regularities, which enable to de- 
duce by inspection that the matrix for any N is 



/ N ... 
1 N-l ... 
2 7V-2 ... o 





Vi o 



N-2 2 
N-lJ 



(6) 



Note that the matrix multiplied by 1/7V has only three 
non-null diagonals, all of them trivial. The first one is 
the sequence N, N — 1,N — 2. ..2, the second one is the 
sequence 0, 1, 2 . . . N — 1, and the third one is only the 
bottom left element, which is always 1. 

To calculate P/v(n) the next step (the second one in the 
list above) is to prune from this matrix the transitions 
to v = 0. The only possible transition to v = is from 
v = N — l, and the probability for this transition is given 
by the bottom left element Mjv(-^V, 1). Nullifying this el- 
ement, the resulting matrix, MV, is particularly simple, 
because it has only two trivial, non-null diagonals: 



M' N = — 

N 



( N ... 
1 N-l ... 
2 N-2 ... 





V o o 
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Now (third step of the list) it is necessary to compute a 
new matrix, T n , which indicates all the transition prob- 
abilities, in n — 1 time steps, between all the states, with 
the restriction that passing from v = N — 1 to 1/ = 
is forbidden. In Markov chains, the m-step transition 
probability matrix is the m-th power of the transition 



matrix 3 . So the new matrix is 



- N 



M 



fll—l 

N • 



(8) 



For N 



for N = 3, 
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This operation is done through the Jordan decom- 
position of M'jv- The element T N (1,N) of this 

(3) matrix is the transition probability from v = to 
v = N — linn — 1 time steps and with the transition 
v = N — 1 — » z/ = forbidden. As the probability of 
passing, in the next time step, from v = N — 1 to i/ = 
is 1/N, P/v(n) is obtained by multiplying that element 

(4) by 1 /N. The results, for N = 2 and N = 3 are as follows. 
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For TV = 2, 



^=| = ^Ef(-V- l (-i) l - i (2-i) 



and for N = 3, 

By inspection, the result for a generic TV is 
1 ^ lYAA 



(9) 



(10) 



(Eq. 14 of the original paper). 
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II. ASYMPTOTIC MEAN OF P N (n) 

The mean duration of the cycle in the box model was 
indicated in Eq. 8 of the original paper: 
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The asymptotic value of this expression can be ob- 
tained considering that 5 



N 1 



— > C + I11TV+ - ( 4 ) , (13) 
i n^oo 2N \NJ ' 



where C ~ 0.5772157 is Euler's constant. Multiplying 
this equation by N we obtain the asymptotic mean of 
P N (n), 



III. ASYMPTOTIC STANDARD DEVIATION 

OF P N (n) 

The variance of P/v(n) was indicated in Eq. 9 of the 
original paper: 
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To simplify the sums, we can change the variable to 
k = N + 1 — i. Because i ranges from 1 to JV, A; will 
range from TV to 1. Then the above equation can be 
rewritten as 

JV N JV o JV 

2_V- l y^1_>^TV 2 TV _ 
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The first sum in the right-hand side of this expression 
can be simplified to 
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Inserting Eq. [13] and this result, Eq. [16] in the limit of 

TV — > oo can be written as 
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(n)jv > TV(C + In TV) + - (14) 

JV^oc 2 



(Eq. 12 of the original paper). The absolute error of this 
approximation is < 0.01 for TV > 9. 



The asymptotic standard deviation is the square root 
of the above equation, 
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Because N — > oo, the term —1/2N 2 can be dropped, so 
the equation can be further simplified to 
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(20) 



(Eq. 13 of the original paper). This approximation has 
an absolute error < 0.01 for N > 3. 

The asymptotic aperiodicity, obtained by dividing 
Eq. [2U| by Eq. has an absolute error < 0.0001 for 
N > 10. 



IV. ASYMPTOTIC CONDITIONAL 
PROBABILITY 

To deduce the asymptotic conditional probability in 
the box model we will first consider the asymptotic value 
of PAr(n) when n — > oo. This value is the first, largest 
term in the sum (when j = 1 in Eq. Ill[) . namely 



P N {n) 



1 

N 



(21) 



where we have denoted a = 1 — 1/N. 

For calculating the asymptotic conditional probability 
we need to deduce the value of the cumulative probability 
distribution, Ajy(n), for that large n. This is easier to do 
by defining the sum 



A 



^(n) = ^a i - 1 = f^. 
f-^ 1 — a 



(22) 



Considering that n is large enough, Pjv(n) can be re- 
placed by its asymptotic value (Eq. l2"Tj) . which is the 
term summed in A' N (n). Then it holds that 



The conditional probability (Eq. 18 of the original pa- 
per) is: 

1 - A N [n - 1) 

Inserting Eqs. [21] to [Ml it results that 
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In the original paper we were interested in the yearly 
conditional probability for the next large earthquake at 
Parkfield. In order to fit the series of previous earth- 
quakes, N was chosen as 11 cells, and one time step cor- 
responded to r = 0.74 years. Substituting these values 
in the above equation, the asymptotic yearly probability 
when a long time has passed since the last large earth- 
quake is 0.11 (the value of 11% cited in the original pa- 
per). 



A N (n- 1) = VP N (n) >l-A' N {n), (23) 
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